Adaptive Informative Sampling for Active Learning

نویسندگان

  • Zhenyu Lu
  • Xindong Wu
  • Josh C. Bongard
چکیده

Many approaches to active learning involve periodically training one classifier and choosing data points with the lowest confidence. An alternative approach is to periodically choose data instances that maximize disagreement among the label predictions across an ensemble of classifiers. Many classifiers with different underlying structures could fit this framework, but some ensembles are more suitable for some data sets than others. The question then arises as to how to find the most suitable ensemble for a given data set. In this work we introduce a method that begins with a heterogeneous ensemble composed of multiple instances of different classifier types, which we call adaptive informative sampling (AIS). The algorithm periodically adds data points to the training set, adapts the ratio of classifier types in the heterogeneous ensemble in favor of the better classifier type, and optimizes the classifiers in the ensemble using stochastic methods. Experimental results show that the proposed method performs consistently better than homogeneous ensembles. Comparison with random sampling and uncertainty sampling shows that the algorithm effectively draws informative data points for training.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Active Sampling for Rank Learning via Optimizing the Area under the ROC Curve

Learning ranking functions is crucial for solving many problems, ranging from document retrieval to building recommendation systems based on an individual user’s preferences or on collaborative filtering. Learning-to-rank is particularly necessary for adaptive or personalizable tasks, including email prioritization, individualized recommendation systems, personalized news clipping services and ...

متن کامل

Topology-Based Active Learning

A common problem in simulation and experimental research involves obtaining time-consuming, expensive, or potentially hazardous samples from an arbitrary dimension parameter space. For example, many simulations modeled on supercomputers can take days or weeks to complete, so it is imperative to select samples in the most informative and interesting areas of the parameter space. In such environm...

متن کامل

Fast Interactive Image Retrieval using large-scale unlabeled data

An interactive image retrieval system learns which images in the database belong to a user’s query concept, by analyzing the example images and feedback provided by the user. The challenge is to retrieve the relevant images with minimal user interaction. In this work, we propose to solve this problem by posing it as a binary classification task of classifying all images in the database as being...

متن کامل

Active Learning Query Selection with Historical Information

This work describes novel methods and techniques to decrease the cost of employing active learning in text categorisation problems. The cost of performing active learning is a combination of labelling effort and computational overhead. Reducing the cost of active learning allows for accurate classifiers to be constructed inexpensively, increasing the number of realworld problems where machine l...

متن کامل

Upper and Lower Error Bounds for Active Learning

This paper analyzes the potential advantages and theoretical challenges of ”active learning” algorithms. Active learning involves sequential, adaptive sampling procedures that use information gleaned from previous samples in order to focus the sampling and accelerate the learning process relative to “passive learning” algorithms, which are based on non-adaptive (usually random) samples. There a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010